feat(kernels): add TVM FFI Triton CUBIN wrapper by Ma1oneZhang · Pull Request #315 · openinfer-project/openinfer

Ma1oneZhang · 2026-06-09T07:35:59Z

Summary

Adds the MVP production-facing TVM FFI path for Triton AOT CUBIN launchers in pegainfer-kernels.

This makes tvm-ffi a required pegainfer-kernels dependency, exposes a packed TVM FFI global wrapper for the generated Qwen3.5 gated-delta-rule Triton launcher, and adds a small example showing registration/discovery of the wrapper ABI.

Changes

Add required tvm-ffi dependency to pegainfer-kernels.
Add pegainfer_kernels::triton_cubin with a TVM FFI packed wrapper for gated_delta_rule_prefill_chunk_solve_cuda.
Add the triton_cubin_tvm_ffi example for registering and inspecting the MVP wrapper.
Document the Triton CUBIN TVM FFI boundary and record the project-doc execution notes.

Validation

cargo fmt --all --check
cargo metadata --no-deps --format-version 1
cargo check --release -p pegainfer-kernels was run after adding /home/ziyang/gpu_memory_profiling/.venv/bin to PATH so tvm-ffi-config could be found. tvm-ffi built successfully, then the existing CUDA build failed in pegainfer-kernels/csrc/shared/flashinfer_top1.cu because the local dirty FlashInfer submodule exposes a different TopKDispatch API. The submodule change is not part of this PR.

Code Review

This pull request introduces tvm-ffi as a required dependency for pegainfer-kernels and implements a TVM FFI wrapper layer for Triton AOT CUBIN launchers, specifically exposing the Qwen3.5 gated-delta-rule triangular solve launcher. It also adds corresponding documentation, tests, and an example. Feedback on the implementation highlights a critical runtime issue in triton_cubin.rs where integer arguments are extracted directly as u64 or i32 using AnyView::try_as. Because TVM FFI packed integer arguments are always passed as i64, these strict type-matching checks will fail at runtime. The reviewer recommends extracting them as i64 first before casting.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-09T07:37:19Z

+fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> {
+    let value = args
+        .get(idx)
+        .ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?;
+    if let Some(raw) = value.try_as::<u64>() {
+        return Ok(raw as usize);
+    }
+    if let Some(raw) = value.try_as::<*mut c_void>() {
+        return Ok(raw as usize);
+    }
+    Err(type_error(spec, idx, "an integer or opaque pointer"))
+}


In TVM FFI, packed function arguments of integer type are always passed as signed 64-bit integers (i64). Since AnyView::try_as performs strict type-code matching, attempting to extract them using try_as::<u64>() will fail at runtime with a type error. To ensure robustness, we should attempt to extract the value as i64 first before trying u64 or *mut c_void.

fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> { let value = args .get(idx) .ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?; if let Some(raw) = value.try_as::<i64>() { return Ok(raw as usize); } if let Some(raw) = value.try_as::<u64>() { return Ok(raw as usize); } if let Some(raw) = value.try_as::<*mut c_void>() { return Ok(raw as usize); } Err(type_error(spec, idx, "an integer or opaque pointer")) }

gemini-code-assist · 2026-06-09T07:37:19Z

+fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
+    args.get(idx)
+        .and_then(AnyView::try_as::<i32>)
+        .ok_or_else(|| type_error(spec, idx, "an integer"))
+}


Similar to arg_handle, scalar integer arguments passed from Python/TVM are represented as i64 in the TVM FFI packed call. Calling try_as::<i32>() directly will fail because of strict type-code matching. We should extract the value as i64 first and then cast it to i32.

fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> { let value = args .get(idx) .ok_or_else(|| type_error(spec, idx, "an integer"))?; if let Some(raw) = value.try_as::<i64>() { return Ok(raw as i32); } if let Some(raw) = value.try_as::<i32>() { return Ok(raw); } Err(type_error(spec, idx, "an integer")) }

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eed6eb7505

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-09T07:40:11Z

+fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
+    args.get(idx)
+        .and_then(AnyView::try_as::<i32>)
+        .ok_or_else(|| type_error(spec, idx, "an integer"))


Accept TVM i64 integer arguments

This path rejects the normal TVM integer representation for seq_len and num_value_heads: AnyView::try_as::<i32>() is an exact type check rather than a narrowing conversion, while TVM FFI integer values commonly arrive as i64 (including from Python/TVM typed calls). In that environment the new global function returns a type error before launching the CUBIN even for valid dimensions; parse i64/u64 and range-check before casting to i32.

Useful? React with 👍 / 👎.

Ma1oneZhang · 2026-06-09T11:42:22Z

Added test coverage in 0c55194 for the TVM FFI Triton CUBIN wrapper:

known/unknown wrapper lookup
TVM FFI global registry round-trip
accepted raw handle encodings (u64 and opaque pointer)
missing-argument diagnostics before launch
handle/scalar type diagnostics before launch

Validation:

cargo fmt --all --check passed
PATH=/home/ziyang/gpu_memory_profiling/.venv/bin:$PATH cargo test --release -p pegainfer-kernels triton_cubin --lib built tvm-ffi, then failed before Rust test execution in the existing CUDA build at pegainfer-kernels/csrc/shared/flashinfer_top1.cu because the local dirty FlashInfer submodule has a TopKDispatch API mismatch. That submodule change remains outside this PR.

xiaguan

Thanks for the patch. I like the direction of exposing these CUBIN launchers through TVM FFI, but I do not think tvm-ffi should become a required dependency of pegainfer-kernels yet.

Please put this behind a feature gate, or move it into a small optional bridge crate like pegainfer-kernels-tvm, so normal kernel builds do not require tvm-ffi-config / libtvm_ffi.

Also, the added tests currently do not compile locally:

PATH=/data/code/workspace-rustllm/pegainfer/.venv/bin:$PATH PEGAINFER_CUDA_SM=120 PEGAINFER_TRITON_PYTHON=/data/code/workspace-rustllm/pegainfer/.venv/bin/python cargo test --release -p pegainfer-kernels triton_cubin --lib -- --nocapture

fails with:

error[E0277]: tvm_ffi::Any doesn't implement Debug

This comes from expect_err(...) on call_packed(...); expect_err requires the success type to implement Debug. Please fix the tests, e.g. by matching on the Result.

Requesting changes until these are addressed.

Ma1oneZhang · 2026-06-09T13:25:24Z

Addressed xiaguan's requested changes in 0228109:

Made tvm-ffi optional behind pegainfer-kernels/tvm-ffi-triton-cubin; normal pegainfer-kernels builds no longer require tvm-ffi-config or libtvm_ffi.
Gated pegainfer_kernels::triton_cubin with that feature and marked the example with required-features.
Replaced test expect_err(...) calls with explicit Result matching so tvm_ffi::Any: Debug is not required.
Also fixed the packed integer handling noted by automated inline reviews: pointer handles and scalar launch dimensions now accept TVM FFI i64 integers with range checks before casting.

Validation:

cargo fmt --all --check passed
cargo metadata --no-deps --format-version 1 passed
cargo tree -p pegainfer-kernels -e normal --no-default-features --depth 1 shows no tvm-ffi
cargo tree -p pegainfer-kernels -e normal --features tvm-ffi-triton-cubin --depth 1 shows tvm-ffi only with the bridge feature
cargo check --release -p pegainfer-kernels no longer needs tvm-ffi-config, then still stops at the pre-existing dirty FlashInfer flashinfer_top1.cu / TopKDispatch mismatch in this checkout
PATH=/home/ziyang/gpu_memory_profiling/.venv/bin:$PATH cargo test --release -p pegainfer-kernels --features tvm-ffi-triton-cubin triton_cubin --lib -- --nocapture stops at the same CUDA build-script mismatch before Rust tests run locally

xiaguan · 2026-06-09T13:41:16Z

@codex review

chatgpt-codex-connector · 2026-06-09T13:44:45Z

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

xiaguan · 2026-06-09T13:46:41Z

rebase or merge master?

feat(kernels): add tvm ffi triton cubin wrapper

eed6eb7

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

test(kernels): cover tvm ffi triton cubin wrapper

0c55194

xiaguan requested changes Jun 9, 2026

View reviewed changes

fix(kernels): gate tvm ffi triton cubin bridge

0228109

xiaguan mentioned this pull request Jun 9, 2026

chore: rename pegainfer → openinfer across the workspace #318

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(kernels): add TVM FFI Triton CUBIN wrapper#315

feat(kernels): add TVM FFI Triton CUBIN wrapper#315
Ma1oneZhang wants to merge 3 commits into
openinfer-project:mainfrom
Ma1oneZhang:feat/tvm-ffi-triton-cubin

Ma1oneZhang commented Jun 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Uh oh!

Ma1oneZhang commented Jun 9, 2026

Uh oh!

xiaguan left a comment

Uh oh!

Ma1oneZhang commented Jun 9, 2026

Uh oh!

xiaguan commented Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 9, 2026

Uh oh!

xiaguan commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Ma1oneZhang commented Jun 9, 2026

Summary

Changes

Validation

Related

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Ma1oneZhang commented Jun 9, 2026

Uh oh!

xiaguan left a comment

Choose a reason for hiding this comment

Uh oh!

Ma1oneZhang commented Jun 9, 2026

Uh oh!

xiaguan commented Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 9, 2026

Uh oh!

xiaguan commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants